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Abstract 

This paper discusses a methodology for determining a functional representation of a random 
process from a collection of pointwise samples. This situation typically arises when experimental 
data are used to estimate some quantities in a form suitable for their subsequent use in a numer- 
ical simulation as initial and/or boundary conditions. The present work specifically focuses onto 
random quantities lying in a high dimensional stochastic space in the context of limited amount 
of information. The proposed approach involves a procedure for the selection of an approximation 
basis and the evaluation of the associated coefficients. The selection of the approximation basis 
relies on the a priori choice of the High-Dimensional Model Representation format combined 
with a modified Least Angle Regression technique. The resulting basis then provides the struc- 
ture for the actual approximation basis, possibly using different functions, more parsimonious 
and nonlinear in its coefficients. To evaluate the coefficients, both an alternate least squares and 
an alternate weighted total least squares methods are employed. Examples are provided for the 
approximation of a random variable in a high-dimensional space as well as the estimation of a 
random field. Stochastic dimensions up to 100 are considered, with an amount of information 
as low as about 3 samples per dimension, and robustness of the approximation is demonstrated 
w.r.t. noise in the dataset. The computational cost of the solution method is shown to scale only 
linearly with the cardinality of the a priori basis and exhibits a {N q ) s , 2 < s < 3, dependence 
with the number N q of samples in the dataset. The provided numerical experiments illustrate the 
ability of the present approach to derive an accurate approximation from scarce experimental 
data even in the presence of noise. 
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1 Introduction 



With the growing available computational power, and as more efficient numerical methods 
become available, domains as diverse as engineering, chemistry, psychometrics, medicine, 
finance or social sciences, now heavily rely on simulation for the prediction of more and 
more complex phenomena, often combining multi-models and high accuracy requirement. 
The prediction capability of modern simulations is often such that a new bottle-neck 
for accuracy has emerged from the lack of relevant boundary and/or initial conditions 
(BICs). Indeed, boundary and initial conditions are often poorly known and have to be 
estimated or modeled. This introduces modeling errors which often constitute the main 
source of lack of accuracy in the simulation chain. This situation has triggered a renewed 
interest for stochastic modeling where it is explicitly accounted for uncertainty in the 
model. With probabilistic BICs, the output of the simulation is a probabilistic quantity 
which interpretation and comparison with real data is left to the final user. The BICs may 
sometimes be modeled from first principles but are often approximated in a functional form 
involving a set of influencing parameters and identified from experimental measurements. 
However, more often than not, only relatively few measurements are available, in particular 
when a significant number of parameters is of influence so that representing the BICs takes 
the form of a high-dimensional approximation problem. 

If the random process, which output is to be represented in closed-form, is driven by 
known equations, efficient techniques may be used to determine its representation. In the 
specific case of high- dimensional quantities, tensor-based representations have proved to 
be effective when applicable. In particular, low-rank approximations based on an a pri- 
ori chosen separated representation can be efficiently derived, see Nouy (2007, 2010a, b); 
Matthies & Zander (2012) in the context of uncertainty quantification (UQ). If a closed- 
form model description of the process at hand is not available, one is typically left with 
approximating it from a finite collection of instances, hereafter termed samples. When the 
process is known only from a closed numerical code used as a black-box or if measure- 
ments can be made arbitrarily (design of experiments), some properties of approximation 
theory can be exploited. For instance, measurements may be taken at some particular lo- 
cations in the parameter space, possibly associating a weight to them, so that the random 
Quantity of Interest (Qol) can be represented in the retained approximation basis with 
good accuracy using (sparse) quadrature techniques, Novak & Ritter (1999); see also Xiu 
& Hesthaven (2005) for an application to UQ. Anisotropy in the Qol may be exploited by 
biasing the quadrature weights, Nobile et al. (2007); Ganapathysubramanian & Zabaras 
(2007); Ma & Zabaras (2010). In Doostan & Iaccarino (2009), an Alternate Least Squares 
(ALS) technique to estimate the coefficients has been considered with samples lying on a 
tensor-product grid. Another situation of design of experiment arises in importance sam- 
pling where the Markov- Chain Monte Carlo algorithm requires a new sample at a specific 
proposed location. This control over the samples usually brings efficiency and allows to 
approximate a reasonably behaved Qol with accuracy. 

A different situation occurs when the data are provided 'as is', with no ability to choose 
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the set of samples nor to add a measurement. This is a common situation, typically arising 
when measurements come from a past experiment or are costly to acquire so that new 
samples cannot be taken. In this context, one has to resort to a regression-based approach 
and the coefficients of the approximation are then solution of an optimization problem. 
This type of approach was considered in Choi et al. (2004); Berveiller et al. (2006). 

In the present work, the focus is specifically put on deriving a closed-form approxima- 
tion of a high-dimensional quantity of interest from a small, uncontrolled, collection of its 
samples. No governing equation is available and the dataset of samples is the only piece 
of information one can rely on. This requires to determine an approximation basis finely 
tuned to the data at hand and an efficient way of evaluating the associated coefficients. 
To this aim, we rely on the fact that, as a counterpart of the curse of dimensionality 
associated with high-dimensional problems, real applications often reward with a bless- 
ing of dimensionality. Indeed, in many cases, the Qol can be well approximated in a 
low-dimensional subspace of the solution space, sometimes involving orders of magnitude 
fewer degrees-of- freedom. This typically occurs when the solution exhibits some degree 
of sparsity in the retained functional space. Efficient techniques have been proposed in 
the recent past to take advantage of this and essentially consist in matching the approx- 
imation with the observational data while promoting a sparse coefficient set. This class 
of methods work well in many different contexts and have been recently applied to the 
UQ framework, Doostan & Owhadi (2011); Mathelin & Gallivan (2012). These techniques 
rely on the Compressed Sensing theory, e.g., Candes & Tao (2004a); Donoho (2006), and 
may seem well suited for the present problem as they promote a low cardinality approx- 
imation of the Qol. However, they require to handle a potentially huge representation 
basis, or dictionary, and associated optimization problem, leading to severe memory and 
computation limitations in the present high- dimensional context. 

In this paper, we explore a solution method combining the strength of different techniques, 
taking advantage of the sparsity of the representation in a suitable basis and allowing 
an efficient approximation of a well-behaved multivariate function with a low number 
of degrees-of- freedom hence compatible with a small experimental dataset. The driving 
principle is first to consider a tight approximation basis based on a priori knowledge on the 
Qol at hand and to rely on the available data to further refine it. In a nutshell, an initial 
approximation basis is first considered in the High-Dimensional Model Representation 
format (HDMR, Rabitz & Ah§ (1999); Ah§ & Rabitz (2001)), assuming it is suitable for 
representing the Qol. This initial basis is referred to as a priori, or proposal, basis. Next, 
available data are used to refine it by retaining only its most relevant basis functions 
through a constructive subset selection procedure based on a modification of the Least 
Angle Regression approach proposed in Efron et al. (2004). This determines a skeleton, 
or final approximation basis, and the associated coefficients are then evaluated with an 
alternate least squares technique. The solution method allows to approximate random 
variables as well as random fields. 

The paper is organized as follows. In section 2, we briefly recall standard techniques for 
deriving a closed-form approximation of a random variable from a finite set of samples. 
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Section 3 is devoted to a short discussion of different representation formats of functions 
in high-dimensional spaces. The proposed solution method is introduced and discussed in 
section 4 along with a simple motivating example. An algorithm is proposed and numerical 
results are provided, demonstrating its effectiveness. Scalability of the proposed approach 
together with its robustness w.r.t. noise in the data is also discussed. In section 5, the 
present methodology is illustrated on a stochastic diffusion equation involving up to 100 
dimensions and on the space-dependent solution of the Shallow Water Equations with 
random parameters. Accuracy, robustness and scalability of the proposed approach are 
shown. Concluding remarks close the paper in section 6. 



2 Quantification of uncertainty 



We briefly discuss the representation of a random quantity and standard ways of evaluating 
it in closed-form from a discrete set of samples. 



2.1 General framework 

Random quantities are defined on a probability space (Q,Bq, //e) where is the space 
of elementary events 9 G 6, Bq a a-algebra defined on 9 and /ie a probability measure 
on Bq. To make the description of the problem amenable to a tractable representation, 
it is convenient to introduce a finite set of statistically independent random variables 
{£i}f =1 : O — > Hj C JR., 9 i— > £i(6). The set of these d random variables is defined on a 
probability space (S,B 3 ,// H ) with S = xf =1 E { = £(6) C R d , £ := . . . £ d ), B E C 2 s a 
cr-algebra on H and /x= = /^eo^" 1 the probability measure on £> s . Since the physical process 
at hand relies on random quantities belonging to (0, Bq, [Mq), a suitable description of its 
output, or its solution in case the physical process is described by a known mathematical 
model, may be determined in (S, S=,/xs) as justified by the Doob-Dynkin lemma. 

In this work, we restrict ourselves to random variables of physical significance, i.e., real- 
valued second order variables satisfying: 

| [u (9) 2 } := / u (9f d/ie (*)=/« (C) 2 d/i= (C) =: f [« (£) 2 ] < +°°, (1) 

where ^ denotes the expectation operator and u is the quantity of interest (Qol). It is then 
natural to consider the space of square integrable functions S for describing real-valued 
functions of the random quantities: 

S := L 2 (E, n B ) = |u :H->R,£h+i;(£); | [u (^) 2 ] < +oo| . (2) 
Upon introduction of a natural inner product of S: (v, n>) L2 ( H ^ := J v (0 10 (C) d/is (C)> 
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Vv,w G S, and the associated norm \\v \\ L 2r B M „) := ( v , v )l 2 (e S is a, Hilbert space. 
Further, we define (v) L 2r 3 ^ '■— <8 \ [v (£)]. One can now rely on functional analysis results 
and take advantage of approximation theory techniques to characterize the output u. 
Introducing a Hilbertian basis {^fc} fceN of S, the output can then be uniquely represented 
as 

£^«(£). (3) 

a 

The basis {ip a } a€ ^ is typically chosen orthonormal w.r.t. the inner product (v, w) L2 ( E 
The orthonormality of the basis leads to (ip a , ^ a ')i 2 (E ^ = S aa r, Va,a' G N, with 5 the 
Kronecker delta, and the decomposition coefficients {c a } then express as 

C a = (w,^> i2(H)MB) = J^U(C) ^a(C) d/X S (C), V « G N. (4) 

For a given representation basis {ip a } of 5, the output u (£) is entirely characterized by the 
set of coefficients {c Q }. For computational purpose, the infinite dimensional representation 
Eq. (3) is substituted with a finite dimensional approximation relying on a subset J C N 
of the representation basis: 

E^ate). (5) 



Computing a data-driven approximation 

As seen above, in many situations, a closed-form model of the Qol is not available or 
not reliable enough to be used and one can only rely on the sole available input-output 
information to approximate the output u. The solution method then consists in using 
a set of outputs given some inputs, i.e., samples of the process. One then looks for a 
functional form of the map between the set of random variables Q- q ' and the output value 
u f£^J =: v,( q \ VI < q < N q , where N q is the size of the available experimental set. 
Approximating the output under the functional form of Eq. (5) results in evaluating the 
coefficients {c a } from £ (<?) = • • • 

2.2.1 Monte Carlo 

If the sampling can be controlled, in the sense that samples can be drawn arbitrarily, the 
popular Monte Carlo approach can be followed and the approximation coefficients are 
then estimated from 

c a = [ u (o v« (0 d^ s (o~E^ {i {q) ) v>« {i [q) ) ■ (6) 



5 



Monte Carlo-based estimation is very robust and easy to implement but suffers from a slow 
O [N q ~ l ^ 2 ^j asymptotic convergence rate. However, since the convergence rate does not 
depend on the dimensionality of the integral, this is a wise choice for very high-dimensional 
problems where other methods fail. Alternatively, quasi-Monte Carlo methods generate a 
low-discrepancy sequence of samples improving the convergence rate of the evaluation for 
moderate- to high-dimensional problems. 

2.2.2 Quadrature 

For low to moderate dimensionality problems, the d- dimensional integral arising in Eq. (4) 
may be advantageously evaluated with a quadrature rule: 

c a = I u (C) i>* (C) d/i= (C) « E ™ {q) « (* (,) ) 1>« (t (q) ) , (7) 

where {w®} are the weights associated with the quadrature points F° r instance, 

the Gauss quadrature rule uses only N q samples to exactly integrate a polynomial inte- 
grand of degree less or equal to 2 N q — 1 while the Clenshaw-Curtis is nested, allowing to 
sequentially increase the order of the quadrature rule without having to re-evaluate points 
sampled at the previous step. 

2.2.3 Regression 

The above methods require some kind of control over the samples. However, experimental 
data are often obtained with little, if any, control on the samples, hence no experimental 
design may be exploited. A solution method is then to reformulate the evaluation of the 
coefficients minimization problem: 

c = argmin ||it — \1/ c\\ 2 , (8) 

with c = (ci . . . C|^|) T , u = (uW . . . u^y, * G R N « X W, q qa = ip a and \ J\ the 

cardinality of the approximation basis {ip a } a€ j- For a full column rank the solution is 
given by c = \E' + u which is typically evaluated using the Cholesky decomposition of the 
symmetric positive definite matrix \1/ T \1/ or the QR decomposition of When the size of 
the dataset grows, this standard Least Squares (LS) problem may become computationally 
involved. The quasi-regression solution alleviates the computational burden and is given 
by 

C a = i>lul \\^ a f % , = (Va • • • ^ (£ W )) T , 1 < a < \J\. (9) 

Standard least squares formulation as considered in Eq. (8) treats all predictors {^a}!^ 
the same way and uses the available data to estimate all the coefficients to produce 
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an estimate with a low bias but often a large variance. As will be discussed in section 
4.4.1, additional properties of the Qol may be exploited or imposed to the approximation 
coefficients. This class of approaches trades some increase in bias with a decrease in 
variance and often results in an improved accuracy. A suitable solution method then 
typically formulates as a penalized least squares problem: 

c = argmin \\u — ^ c\\ 2 + J? (c). (10) 



The properties of the penalized LS solution are driven by the choice of the function 
which flexibility leads to a variety of solution techniques, see Hesterberg et al. (2008); 
Hastie et al. (2009). Since we have no control over the sampling strategy, we will rely on 
regression to estimate the approximation coefficients. The discussion of an efficient least 
squares formulation in the present context is postponed to section 4. 



3 Functional representation of random variables 

3.1 Polynomial Chaos 



As seen above, a random quantity is conveniently approximated in a Hilbertian basis 
{ipk}- This allows for a closed-form representation and is suitable for subsequent use, say, 
as an input of a numerical model. Many choices may be made for the basis and this brings 
flexibility If the random quantity is known, or expected, to exhibit a certain degree of 
smoothness along the stochastic space, a suitable and popular choice is to take advantage 
of this smoothness using a spectral-based approximation relying on polynomials. Early 
efforts towards this direction are the pioneering works of Wiener (1938) who used univari- 
ate Hermite polynomials ip a (£») of zero-centered, unit variance, normal random variables 
£i ~ J\f (0, 1). These polynomials define an orthogonal basis of L 2 (S i; fiEi), /^-i oc e~^i . 
Tensorization of univariate Hermite polynomials i/j then leads to an orthogonal basis of 
L 2 (5, /is): 

("0Q: 1 *0ct' ) L 2 (H, us) x L^a (0 (0 e"^ d£ oc 5 aa >. (11) 



This can be extended to polynomials orthogonal with respect to different measures, 
Ghanem & Spanos (2003); Xiu & Karniadakis (2002); Soize & Ghanem (2004), and consti- 
tutes the so-called (generalized) Polynomial Chaos basis. A common practice is to consider 
an approximation space S p spanned by polynomials of given maximum total degree p: 



S p = span (^{if) a (£) = ip ai (&) ...ip ad (£ d )} ; a = (a x . . . a d ) , ^ a* < p\ 



(12) 



and the number of terms to be determined in the approximation (5) is then \J~\ 



7 



. We adopt the convention ipi = 1. When the random quantity is not smooth 

V d ) 

enough for a low degree polynomial fit to be accurate, approximation schemes such as 
h/p-type refinement or Mult i- Resolution Analysis may be applied, see Le Maitre & Knio 
(2010). 



3.2 Tensor-based representation 



Some alternative representation formats specifically exploit the tensor-product structure 
of the Hilbert stochastic space S. This class of methods typically approximates a <i-variate 
function with a series of products of lower dimensional functions. Efficient algorithms allow 
to determine the approximation coefficients of the representation by solving a series of 
low-dimensional problems while never considering the full- dimensional problem at once. A 
general presentation of tensor-structured numerical methods can be found in Khoromskij 
(2012) while application to the approximation of a high-dimensional random quantity 
is considered in Khoromskij & Schwab (2011); Nouy (20106); Matthies Sz Zander (2012); 
Doostan & Iaccarino (2009). For instance, a <i-variate quantity may be approximated under 
a CANDECOMP-PARAFAC (CP) format with a sum of rank-1 terms, the simplest form 
of tensored-structure format: 

T> r 

«(£)«E/l,r(£l) ■■■ fd,r(td), (13) 
r=l 

with n r the retained rank of the decomposition and {fi, r }f =1 univariate functions. These 
univariate functions are usually sequentially determined in turn so that only 1-D problems 
have to be solved. Assuming p-th order polynomials for {/i, r }, the resulting cardinality 
of the approximation is dn r p. It thus exhibits a linear dependence with the number 
of dimensions, in contrast with the exponential dependence of the Polynomial Chaos. 
Approximation techniques more efficient than the decomposition (13) rely on numerically 
robust and/or memory efficient data-structures, e.g., Tucker format or Tensor- Trains, see 
Khoromskij (2012). A tensored-structure format then constitutes a method of choice for 
deriving memory- and CPU-efficient approximation of high-dimensional quantities. They 
also lead to a low- cardinality basis \J~\ so that the conditioning of the approximation 
method remains good, in the sense that \J~\ < N q , a crucial feature for deriving a good 
approximation from the scarce available data. 



3.3 High- Dimensional Model Representation 



In a similar perspective of using an approximation basis with the least cardinality possible 
to make the most out of the scarce data, an efficient alternative to the tensored-structure 
formats of the previous paragraph for representing high-dimensional quantities is discussed 
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in Rabitz & Ah§ (1999); Ali§ & Rabitz (2001). It consists in representing a quantity u (£) 
with a sum of lower- dimensional terms accounting for increasing levels of interaction 
between the constitutive variables: 

« (*) =/* + £/< (6) + E /« (6, £,■) + ••• + /i2... d (d, • ■■,&) = E A. ( 14 ) 

i=l M=l, 7C{l,...,d} 

where / 7 are functions of 5 and depend only on a subset of variables £ 7 = {£j} i6 _, and 7 
is a multi-index. This decomposition is exact and does not introduce any approximation. 
It is also unique. An important property is that the modes {/ 7 } are mutually orthogonal: 
(/-y> f-y') L 2 (3 = O5 ^7 7^ 7' — {!>•••> The zero-th order term fa accounts for the 
mean and is invariant across the entire domain S, while the other modes are zero-mean: 

/ = « 2(HiMs) , (A) L2(s , Ms) = 0, V 7 C{l,...,d}\0. (15) 



First order terms {fi} i=1 are responsible for the approximation of u (£) with univariate 
functions of variables acting independently, second-order terms {faj} i - =1 represent the 
joint-impact of two variables & and onto the output w, etc. The rationale behind 
the expected success of this so-called High Dimensional Model Representation (HDMR) 
is that many quantities of interest exhibit a significant dependence on low-dimensional 
groups of variables only, hence having negligible high order interaction decomposition 
terms. This leads to an efficient approximation of u with only a low iVj-order HDMR: 
u (£) ~ E 7 c{i,..,d}/7 (^7)' 1^1 — ^ e denote <?f the se1, OI " retained modes, Jf := 

{ 7 c{i,...;47i7i<^}. 

Functions {/ 7 } are evaluated with the application of a set of commuting projections {Pi} 
onto the output u. The projection Vi eliminates the effect of variable & while leaving 
the effect of the others unchanged. Letting V® be the identity operator on S, we define 
V v = Y\_ Pi, V?} C Functions {/ 7 } can then be written, Kuo et al. (2009), 

/ 7 c{i,...,d}\0 = P{i,..,d}\ 7 w - E /v = E (-l) l7hl7 1 P{i,...,4\ 7 ' w, /0 = ^{i,...,4 w. 

(16) 

Defining projections as 7>i u (£) = Je» u (6l> • • • > C> 6+1, • • • , &) d/i (C), the measure // 
determines the form of the projection. A popular choice consists in using fi = fj,^ so that 
the Analysis of Variance (ANOVA) decomposition is obtained. An example of application 
of the HDMR representation to the approximation of a random quantity is presented in 
Ma & Zabaras (2010). 
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4 Quantifying the uncertainty of experimental data 

4-1 Setting up the stage 

In the following, we will consider that the quantity of interest u is a scalar-valued random 
field, indexed by space and/or time x £ and depending on a set of random variables 
£ £ ]R rf . To approximate it, the only available piece of information is the collection of 

samples \x( q \g q \uM\ 9 . Since the underlying random quantity is only known through 

l > q=l 

these samples, no governing equation for the Qol can be exploited and, say, Galerkin 
projection-based weak-formulation methods cannot be employed. Further, these samples 
are collected at random and do not follow a deterministic rule so that no deterministic 
sampling strategy can be assumed. Quadrature-based techniques can then not be applied 
either and one has to resort to regression to estimate the coefficients of the approximation 
in the retained basis {ip a }- Standard L 2 -regression solves Eq. (8) which is only well-posed 
for a matrix \1/ such that \I/ T \& is invertible so that it requires the number of observations 
to be larger than the cardinality of the approximation basis, N g > \J\. Since the samples 
are believed to lie in a high-dimensional space selecting an approximation basis which 
both leads to a well-posed regression problem and a good representation accuracy could 
be a daunting task when observational data is limited. 

The choice of a good approximation basis in a general setting largely remains an open 
question. On one hand, if one is given a dictionary of approximation functions, a priori 
selecting the best terms so that they can be evaluated from the data is generally a NP- 
hard problem. On the other hand, dictionary-learning techniques require a training while 
availability of an independent training set cannot be assumed here. 

We first consider the approximation strategy for a random process in the form of a sepa- 
rated representation u (x,£) ~ J2n w n ( x ) A n (£), section 4.2. Modes {w n (a?)} are associ- 
ated with all physical dimensions the random process may be indexed upon (space, time, 
...) so that x = (xi 22 • • • t . . .) C M. da> . Examples considered in this paper are space- 
dependent random variables so that x = (x\ 22 • • •£<£,.) and {w n (x)} will be referred to 
as 'spatial modes'. In contrast to x, the dimension of £ can be very large and a specific 
strategy is then discussed for representing the 'stochastic' modes {X n (£)}, sections 4.3-4.5. 

In the present work, we separate the determination of an efficient representation format 
from the evaluation of the coefficients. We first choose an a priori, or proposal, format for 
the approximation, section 4.3, and leave the selection of particular terms to be included 
in the retained approximation basis to a dedicated subset selection procedure which will 
further refine the approximation basis, section 4.4. Results from Compressed Sensing show 
that the number of samples necessary for accurately selecting the dominant basis functions 
of a .fT-sparse Qol varies as K log(|j7"|), Candes & Romberg (2006), illustrating the fact 
that it becomes increasingly difficult to select the best terms when the size \J\ of the 
proposal dictionary increases. Hence, for the subset selection step to be efficient in terms 
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of required amount of data, the retained a priori format has to be tight, in the sense of 
leading to a low approximation error with a low number of degrees of freedom (terms): 
it should be chosen as tight as possible based on knowledge one may have on the Qol at 
hand. As will be seen below, the computational cost associated with the subset selection 
scales with the cardinality of the proposal dictionary. The subset selection hence produces 
a basis, or skeleton, suitable for the data at hand. However, it requires a basis linear in its 
predictors. For this reason, once a set of functions found contributing significantly to the 
approximation is determined, the actual approximation may be evaluated with a different 
basis, of the same skeleton, possibly nonlinear in its predictors such as a tensor-based 
approximation. Evaluation of the final approximation is discussed in section 4.5. 



4-2 Approximation of a random process 



The approximation of a random process, say, a space-dependent uncertain quantity u (x, £) 
is considered. An approximation is determined in the form of separation of variables: 

N 

u (aj,£) « w (x) + Y w n (x) X n (£). (17) 

n=l 

For convenience, the decomposition is rewritten as u(x,£) ~ J2n=o w n( x ) A re (£), with 
A = 1. The spatial modes are defined as: w n (x) = YJu=i ( x ) with {</>;} a chosen 

truncated basis of cardinality \J~ X \. The functional form of stochastic modes {X n } will be 
discussed in section 4.3. 

The spatial and stochastic modes of the approximation (17) are sequentially determined in 
turn. Let \\v\\ N := (v, v) N ^ be the norm induced by the data-driven inner product: (•, -) N : 

R x R — > R, (v, w) H- (v,w) N := Z) g =i v ^ w^ q \ Assuming {A n } known and projecting 
Eq. (17) onto the space spanned by {</>;}, the coefficients |c^| of the deterministic mode 
w n are the solution of the following problem: 



1 n-1 



(u, <f) k X n ) 



N a 



Y Wn' X n ' + C< Cn & X n ,(j) k X 



VI < k < \ J X \ 



\n'=0 



1=1 



C 



arg mm 



n-1 



U 



- Y W n'& K' - (* C) © A n 



n'=0 



where $ e R"«*W $ gi = 0, (x®),w n = (w n (x^) . . . w n (i™)) , A„ = (A n . . . A n 

and is the Hadamard product. Similarly, the stochastic mode A n is evaluated by deter- 
mining the set of coefficients |c^ A ^| minimizing u — J2n~=o w n' © \i> ~ w n © \i ({ c i A ^}) 
using Algorithm 2 to be presented in section 4.5 The spatial mode w n is then evaluatec 
from Eq. (18) given all the other information and the whole iteration is repeated until 
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convergence of the pair {w n , A n }. The next pair can then be determined with the same 
methodology with n <— n + 1. The algorithm is summarized in Algorithm 1. 

Algorithm 1 Sketch of the solution method for approximating a random process 
1: Choose \J~ X \. Initialize z = u and Ao = 1 and set n <— 0. 
2: repeat 

3: while HAnHjy not converged do 

1 3 I 

4: Solve for coefficients {c[^| " of the deterministic mode given A„ and z 

using Eq. (18) and normalize w n (x) = J2i 4>i ( x )i ll^nlljv = 1- 
5: Solve for the stochastic mode A„ (£) using Algorithm 2 given w n and z. If 

n — 0, A <- 1. 
6: end while 

7: Set z <— z — w n © A n , and n <— n + 1. 

8: until a termination criterion is met (for instance, || || ^ below a threshold or maxi- 
mum rank n > N reached). 



Remark 1 // the separated approximation grows beyond a few modes, it is beneficial 
to update the coefficients of, say, the spatial modes for improved accuracy: solve for 
{w , w n } given {u, A , . . . , A n }. 

4-3 A basis suitable for scarce data regression of a random variable 

Since the solution method discussed below is not specific to as separated format u (x, £) ~ 
J2 n w n («c) A n (£) but applies to random variables in general, the Qol u referred to in 
the subsequent sections is any stochastic mode A n (£). The discussion then focuses on 
approximating a high-dimensional random variable u (£). 

In this work, we want to take advantage of the low order interactions of constitutive 
variables for many quantities of practical interest as mentioned in section 3.3. Previous 
works have shown evidence of this low interaction configuration in various situations, 
Rabitz & Ah§ (1999); Ah§ & Rabitz (2001); Ma & Zabaras (2010), and the Qol is hence 
chosen to be approximated under the HDMR form, Eq. (14). 

To assess the choice of our a priori functional form for approximating a random vari- 
able, and while choosing a good basis is problem- dependent, let us consider a simple 
motivating example in the form of a steady-state stochastic diffusion equation on Q x H, 
Q = [x-,x+] C K, with deterministic Dirichlet boundary conditions: 

V, (v(x,£) V x u(x,t))=F(x,e'), =u-, u(x + ,£) = u+. (19) 

The right-hand side F is a random source field and v is a space-dependent random diffusion 
coefficient modeled as: 
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v (x, £') = i/ (z) + J] y/°^kUv,k (z) £' = (fi • • • ^ 

fe=l 



f (x, i") = F (x) + £ (x) & r = • • • a) > (2°) 



fc=i 



with z/ (x) = 1 and F (x) = —1 the respective mean values. The random variables 
• • • ? £d„> £i> • • • > ^dp} are chosen mutually independent and uniformly distributed on 
[0,1]. The spatial modes u Ut k (x) and uf,u (x), and their associated amplitude ^Ja v ^ and 
y/<JF,k, are the first dominant eigenfunctions of the following eigenproblems: 



(x-x'f 

K u (x, x') u U)h (x) dx' = a U)k u) U)h (x) , K v (x, x') = a 2 v e 2L ^ , 

Ix — x \ 

K F (x, x') u Ftk (x') dx' = a Fjk u F:k (x) , K F (x, x') = ape 2L i F , (21) 

with F^ and F^ the correlation kernels. The random fields properties are chosen as 
a v = 0.7, a F = 0.7, L CjV = 0.3, L F ,„ = 0.3. Note that i/ (-,£') > a.e., and F (•,£") < 
a.e., so that the problem remains coercive. The spectra of the operators associated with 
these eigenproblems are here the same and decay quickly thanks to the high correlation 
length as can be appreciated from Table 1 where the dominant eigenvalues are given. The 
resulting problem is then anisotropic in S in the sense that the degree of dependence of 
the input random parameters along the dimensions . . . strongly varies. 

01 0- 2 C3 04 0"5 0"6 °7 0"8 

0.1815 0.1396 0.0906 0.0450 0.0236 0.0097 0.0035 0.0011 

Table 1 

Upper part of the spectrum of eigenproblems (21). 

Denoting £ = G M d , d = <i„ + d F , the solution w is approximated in a rank-iV 

separated form: u (x, £) ~ u (x, £) = wq (x) + Z^Li w n {%) A n (£). Each stochastic mode 
A n (£) is determined either in a CP-like format, Eq. (13), or as a HDMR decomposition 
Eqs. (14). In the latter case, interaction modes {/ 7 } are approximated with a low-rank 
canonical decomposition on tensorized, unit-normed, univariate polynomials of maximum 
degree p: / 7 ({&} iey ) ~ E"=i Ili 67 Ea= 2 c 7 *« (6)- This approximation is hereafter 

referred to as a CP-HDMR decomposition. Similarly, univariate functions {fi, r }^ =1 in- 
volved in the CP decomposition Eq. (13) are approximated with the same polynomials: 

fi,r ~ X/a=l Ca,i,r (Ci)- 

The representation of the stochastic modes here relies on p = 8-th order univariate Leg- 
endre polynomials {ip a }- The dimension of the problem is chosen to be d v = d F = 5 so 
that d = 10. The approximation accuracy is estimated by the relative L 2 -norm e of the 

approximation error estimation evaluated from a A^-point test set j ^x®, £® , it®) j 
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independent from the training set: 



\u 



u\ 



u 



u 



(1) 



u 



u 



u 



(1) 



It 



(22) 



The CP-HDMR expansion is here built sequentially, starting with 0-th and 1-st order 
interaction modes only. From this first approximation of the output, the set of dominant 
dimensions is estimated from the L 2 -norm of univariate interaction modes {/ 7 }| 7 | =r Only 
second order interaction modes {/ 7 }| 7 | =2 in these dominant dimensions are next estimated 
and the set of dominant dimensions is then further refined based on both 1-st and 2-nd 
order interaction modes via the sensitivity Sobol indices, see section 4.8. Third order modes 
are then computed for this new set of dominant dimensions only and the procedure is 
repeated until some stopping criterion is met, for instance a maximum interaction order Ni 
or a maximum basis cardinality \J\. The number of samples N q is here chosen sufficiently 
large so that full knowledge on u can be assumed. The approximation error then only 
comes from the choice of the approximation basis format, allowing a comparison. This 
section is loose on details, focusing on the main conclusions and leaving more in-depth 
discussion for subsequent sections. 

First, the accuracy of the CP-HDMR approximation as a function of the decomposition 
rank N is studied in terms of e, Eq. (22). Plotted in Fig. 1, the approximation error 
estimation e decreases when the maximum interaction order iVj increases from 1 to 3 and 
as the decomposition rank N increases. 

0.01 



0.001 r 



0.0001 



1e-05 



1e-06 




Fig. 1. Convergence of the error estimation of the approximation with the order iV of the sepa- 
rated representation and the maximum interaction order 2Vj of the HDMR expansion. 



The approximation is seen to improve exponentially fast as the number of modes N in the 



11 



separated representation increases until it reaches a plateau at the best approximation one 
can get with the retained interaction order iVj. Increasing the interaction order leads to 
an improved approximation: increasing from first to second order brings more than a one- 
order of magnitude improvement in the approximation error estimation and an additional 
order of magnitude from JVj = 2 to Ni = 3. In this d = 10 example, the approximation 
hence exhibits a high convergence rate with iVj, supporting our assumption that low-order 
interactions dominate the HDMR decomposition. 

This CP-HDMR approximation of the stochastic modes is now compared with a CP- 
like approximation in the form of Eq. (13), derived with an algorithm similar to that 
in Nouy (20106). Both decompositions rely on the same approximation basis for the 
deterministic modes {w n }. We focus on the accuracy of the reconstruction as a function 
of the cardinality of the whole approximation basis both for d = 10 and d = 40 when 
the maximum decomposition rank N varies, see Fig. 2. The total cardinality increases as 
more terms are considered in the decomposition series. 



0.01 



0.001 r 



« 0.0001 r 



1e-05 r 



1e-06 <— 
100 




100 1000 10000 100000 1e+06 1000 10000 100000 1e+06 



Fig. 2. Convergence of the approximation error estimation e with the total cardinality \J\ of the 
representation basis. Approximations of the stochastic modes with the CP-HDMR and CP-like 
format are compared, d = 10 (left) and d = 40 (right). 

The accuracy of the representation is seen to improve as more terms are considered in 
the series expansion and both the CP and the CP-HDMR formats exhibit an exponential 
convergence with the total size of the decomposition. The CP decomposition achieves 
a 10~ 4 relative error with a mere 20,000 coefficients (d = 10) and with about 100,000 for 
d = 40. The first order CP-HDMR quickly reduces the error but plateaus as the functional 
space is small. A second order approximation already allows a similar 10 -4 relative error 
with about 10,000 unknowns for d = 10 and 100,000 for d = 40. The third order CP- 
HDMR decomposition is more costly in number of coefficients to determine to reach a 
given error and, in the present settings, should only be considered if high accuracy is 
needed. 

Unless the targeted accuracy is really high, this motivating example tends to indicate that, 
for a reasonable required accuracy, a CP-HDMR format involves fewer unknowns than a 
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CP-like decomposition, both for a low d = 10- and a moderate d = 40-dimensional prob- 
lem. This is an important point since the number of coefficients which can be evaluated 
with a reasonable accuracy from experimental data is directly related to the size of the 
available dataset. Moreover, the Sobol indices-based threshold used to incorporate higher 
interaction order terms was only coarsely hand-tuned leaving significant room for further 
improved CP-HDMR performance. Finally, a CP-HDMR format allows a great flexibility 
in representing interaction modes {/ 7 }. In particular, a more parsimonious representation 
will be used in the sequel and will achieve a similar accuracy with a lower number of 
terms. This motivating example hence demonstrates that a general CP-HDMR format 
approximation may compare favorably with a tensor-based approximation in terms of re- 
quired number of basis functions for a given reconstruction accuracy, even for reasonably 
large dimensional problems, and motivates our choice of an HDMR format for the a priori 
basis. 



4-4 Subset selection 

We now build up from the a priori basis and further improve it with an a posteriori, 
data-driven, procedure. 

4-4-1 A direct approach 

As discussed in section 2.2, different techniques may be used to compute the coefficients 
of an approximation. In the case considered in this paper, they are evaluated with a 
data-driven approach from the N q pieces of information available on the Qol. However, 
the available data is scarce while the cardinality | priori of the proposal approximation 
basis may be large, in particular when the dimensionality d of the problem is large. It 
can then result in an ill-posed problem where one has to estimate | jTprior I coefficients for 
each stochastic mode A n from N q <C | J W i or \ pieces of information. However, this situation 
often only reflects our lack of knowledge on the quantity at hand and how conservative 
this naive approximation method is. Indeed, the "blessing of the dimensionality" yields 
that high-dimensional problems are often intrinsically sparse and lower dimensional. In 
the present setting, it is likely that many dimensions actually hardly contribute to the 
approximation and that representing the dependence of the Qol along only a subset of 
the dimensions yields an acceptable accuracy. In our HDMR representation, it means 
that many interaction modes {/ 7 } can be discarded without significantly affecting the 
accuracy. The challenge for an efficient solution method is then to reveal and exploit the 
low-dimensional manifold onto which a good approximation of the solution lies. This leads 
to a sparse representation where most of the coefficients are negligible w.r.t. the dominant 
ones, hence yielding a well-posed problem, | priori < N q . As an illustration, if u (£) = g (&) 
was depending only on one dimension i, i e {1, . . . , d}, information theory allows to show 
that one only requires m + 1 + |~log 2 of] function evaluations to approximate a sufficiently 
smooth function g G C s , having s continuous derivatives, so that ||it — w|| C / 3 .s < ah s , 
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h := 1/m, where a > is related to a norm of g, DeVore et al. (2011). This number of 
samples actually is directly related to the number of information bits required to represent 
the integer i e [l,d]. 

While determining which coefficients are dominant is an NP-hard problem in general, 
recent results have shown that a good estimation of the best subset can be obtained as 
the solution of a convex optimization problem. In particular, the LASSO formulation has 
been extensively studied and proved effective, Tibshirani (1996). One formulation of the 
LASSO, referred to as Basis Pursuit Denoising, writes: 

c = argmin || c|| x s.t. \\u — ^> c|| 2 < e, (23) 

with \1/ the matrix of evaluations of the approximation basis and e the approximation 
residual. Efforts from the signal processing community, where the theory supporting these 
results is termed Compressed Sensing, have demonstrated its good recovery properties in 
the case where N q < |j7p r i r|, e.g., Chen et al. (1999); Candes & Tao (20046); Donoho 
(2006). In particular, this formulation achieves provable and robust recovery bounds: for 
a sufficiently incoherent set of approximation and test functions, a i^-sparse solution c to 
Eq. (23) satisfies, Cai et al. (2010), 

\\c*-c\\ 2 <hL+ l ^^Y (24) 



where h > is a constant depending on the set of approximation and test functions and 
c* K is the X-term approximation of c* given by an oracle, i.e., it is the best i^-term 
approximation of c* if one was given full knowledge of it. 

The Compressed Sensing technique was proved very effective and is now being applied 
in many areas, including Uncertainty Quantification, Doostan & Owhadi (2011); Math- 
elin & Gallivan (2012). However, standard implementations of the algorithm require the 
sensing matrix ^ to be available. This bears an intrinsic limitation when it comes to high- 
dimensional problems as it requires the use of the whole dictionary at once from which 
to select the basis functions associated with the dominant coefficients. This can then be 
seen as a top-to-bottom approach where one starts with a whole set of basis functions 
and shrinks it so that the best subset remains, in the sense of Eq. (23). While effective, 
this approach is not tractable for high- dimensional problems, neither in terms of storage 
requirement nor CPU burden. 



4-4-2 A progressive selection 

To circumvent the issues identified above, we here use a bottom-to-top approach which 
achieves a forward stagewise regression by progressively revealing important basis func- 
tions. Introduced by Efron et al. (2004); Hastie et al. (2009), the Least Angle Regression 
Selection (LARS) technique relies on analytical solutions to speed-up computations and 
essentially follows the piecewise linear regularization path of the LASSO. In a nutshell, it 
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consists in selecting, from the a priori set j7p r ior, the predictor (approximation function) 
which is most correlated with the current residual, move this predictor to the active set 
impost) compute the increment solution vector by minimizing the residual L 2 -norm and 
follow the descent direction along the increment vector until a predictor from the inactive 
set becomes as correlated with the residual as those from the active set. The whole pro- 
cess is then repeated and allows to sequentially build the optimal subset of approximation 
functions by exploring the Pareto front defined by the competition between the two terms 
of the unconstrained formulation of the optimization problem of Eq. (23). One advantage 
of LARS over other techniques is that the whole potential dictionary is never stored nor 
used. A LARS approach in the UQ framework was also considered in Blatman & Sudret 
(2011). 

We consider the following polynomial approximation / 7 of / 7 : 

fi ({&}iey) ~ fi (fc)ie 7 ) := E ^-({fchey), V>« = IIlMfc), ( 2 5) 

a,\a\<p *£7 

with a. = (oti,i G 7), a.; G {1, . . . ,p}. Interaction modes {/ 7 } are then approximated in 
P~, the space of polynomials with maximum total degree p, by modes {/yj linear in their 
coefficients. 

In the present framework, the HDMR approximation format naturally leads to groups of 
predictors which importance in describing the Qol u follows a similar trend. These groups 
are defined by the subsets {j7" 7 } of predictors which belong to a given interaction mode 
ff, J~i = \i>cL ({6i}ie 7 )}> an d are likely to be strongly correlated. For instance, if the 
Qol exhibits a strong dependence on a given dimension £j, one then wants to incorporate 
the whole set of predictors ({£i} ie7 )}, 7 : j G 7 without evaluating their relevance 
individually. One then looks for an approximation which is sparse at the level of groups 
of functions. Note that grouping predictors significantly alleviates the computational cost 
associated with the subset selection as further discussed in section 4.7. 

It is important to stress that this approximation format is made only for the subset 
selection step and is independent of the format the Qol is finally approximated in. The 
selection of groups reduces to selection of interaction modes / 7 and leaves the possibility 
for using different formats between the subset selection and the coefficients evaluation 
steps: an interaction mode found to be dominant is incorporated to the active dictionary 
i7/,postj independently of the way its contribution to the approximation of u is actually 
determined in the end. Indeed, since the LARS technique only applies to predictors linear 
in their coefficients, an approximation f 1 of the form (25) is suitable for the selection of 
the dominant groups. However, the final approximation f 1 of the retained / 7 may rely on 
predictors nonlinear in their coefficients: the subset selection step only serves to determine 
which interaction modes will be considered in the a posteriori approximation basis, the 
'skeleton' {/ 7 : 7 G J/, P ost}- 

The selection is made using a modified LARS approach and the following optimization 
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problem is solved: 

c = argmin ||it — \& c\\ 2 2 + r V] || c-y || ^ , (26) 

with r > the regularization parameter and ||-||^ a norm induced by a positive definite 
matrix K 7 . All predictors within a group 7 are here weighted similarly so that we use a 
scaled identity matrix K 1 = hj„\ /\<7~y\, V7 G Jf. The regularization term is a combina- 
tion of L 2 - and L 1 - norms and penalizes the L 1 -norm of the 'group' vector to promote a 
collective behavior: either a group is basically active (non-zero i^ 7 -norm) or inactive, es- 
sentially disregarding the detailed behavior within the group. This group LARS (gLARS) 
strategy was first proposed in Yuan & Lin (2006) and the algorithm presented in Xie & 
Zeng (2010) was modified to solve the optimization problem (26). 



4-5 Approximating a random variable 



We now discuss the general methodology for approximating a random variable u (£) from 
a finite set of its realizations. An a priori choice of a representation format was first made 
based on knowledge one may have on the expected form of the Qol, section 4.3. This finite- 
dimensional space may still be large w.r.t. the amount of available information and was 
adjusted based on the data through the subset selection procedure, the a posteriori step, 
section 4.4. This has selected a set of groups, or interaction modes, {f-y} ie j f t deemed 
to contribute most to the HDMR representation of the Qol. The actual approximation of 
u will rely on these selected groups but does not bear restriction on the linearity w.r.t. the 
coefficients so that different suitable formats, possibly nonlinear, can then be considered, 
possibly leading to fewer number of coefficients than at the a posteriori step, to further 
reduce the number of coefficients to estimate. 

One possibility is to determine an approximation of {/ 7 ,7 G Jf^ost) m the space P p of 
polynomials with maximum total degree p along each dimensions as considered in the last 
section: 



fi ("teheyj » ft {{tihe-r) = Yl c i^^ a ({&} i67 J, ^ ({&} iey ) = II (&)> 

a,|ct|<p i(zf 

a=(a ii iey), ct% G {!,...,£>}, 1 < | 7 | < JV/ PC) < min (N h p) . (27) 



The cardinality associated with the approximation of / 7 at a given iteration level I = I7I is 
\Jy\ — p\/ (/! (p — 1)1) and usually provides an accurate approximation with a low number 
of coefficients for low dimensions |— y| . 

(PC) 

When the dimension | — y | increases beyond a prescribed threshold , the number of 

terms in / 7 decreases and eventually degenerates for | — y | > p. For these higher interaction 
order modes, a low-rank canonical decomposition of the interaction modes is instead 
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considered: 

n r p 

fi Ohey) « A ({6} i67 ) = E II E (6), < l7l < W < d. (28) 



The maximum number of modes at a given interaction level I is d!/ {{d — l) \ Relying 

(PC) 

on an approximation in F p for interaction modes of order |7| < iV ; and on low-rank 
approximation for higher interaction order modes, with maximum rank n r , the total car- 
dinality of this approximation format is bounded from above by 

N ' PC) d\p\ Nl d\ 

lJprioA = § (d-iy. (iff (p-o! + w=m nJp - (29) 

We denote i7/, e ff the set of modes \ / 7 \ finally considered for the approximation 

of u and J e s the set of associated predictors The interaction modes are estimated 

sequentially. Once a new mode is evaluated, the whole approximation may be updated by 
reevaluating the coefficients of the predictors already evaluated of the current evaluation 

set {Jcs}- Let z = (z^ . . . z^ Nq ^ be the residual vector after basis functions / 7 , 7 G Jf^s 

have been evaluated. The coefficients involved in the next mode / 7 ,7 G J7/, P ost\i7f,cff to 

(PC) 

be evaluated are then determined. If 7 is such that | — y | < TVy , they are computed from 
the following system of equations 1 : 

j c 7i . =argmin||2-^c|| 2 , Vi G 7, 7 C J), post , | 7 | < Ar/ PC) , (30) 

1 1 iT-y I 



with c 7) . = (c 7 a -, f G 7) T and 



z (0) = u {9) 



£ /r({e!"}, 61 ,), . = (*">...*<*>)'- 



^o = ^({d 9) } i£7 ), (31) 

To solve for the coefficients associated with predictors nonlinear in their coefficients, an 
Alternate Least Squares (ALS) approach is used, reformulating the nonlinear problem 
into a set of coupled linear equations: 

{ c£ = argmin \\ Zi - *c|| 2 , V2 G 7 C ^ />poBt , ivf c ) < | 7 | < JVj, (32) 



1 While not found necessary here, the solution of the least squares problem may be regularized 
by adding a generic term of the form (3 \\Lc\\ 2 . A typical choice is L = Iij | but one may also 
want to consider non-diagonal matrices L. 
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with cl£, = (& % tl . . . c r J p ) and 



~f'QJf,cff r>=li'£y a'=l 



r-1 



*,„=</>„ (d a) ) n *=[*»=]• (33) 

i'E'y,i'^i a!=X 

This whole step is embedded in a loop over the modes / 7 , 7 £ J /, post retained by the subset 
selection procedure. The cross-validation error (CVe) is estimated from N q validation 

samples {£®,u®}„ ff independent from the N q samples of the training test. If the cross- 
validation error has increased over the last two loops, the approximation basis is likely to 
have become too large w.r.t. the available data and basis building iterations are stopped. 
The retained basis is then the one that has led to the lowest CVe. On the other hand, 
if CVe keeps decreasing, the next interaction mode as selected by the subset selection 
step is considered and added to the current active set i7f,eff and the whole iteration is 
carried-out. Once the approximation is determined, the coefficients are updated with the 
same sequential technique using both the training and the validation points, N q + N q . 
The final approximation error e is then estimated with the N q test samples, Eq. (22). The 
global methodology is summarized in Algorithm 2. 



4-6 Robust estimation 



An important concern when deriving a methodology is the robustness w.r.t. noise, in par- 
ticular when dealing with experimental data as in the present work. We hence investigate 
this point and propose a more robust alternative to the methodology discussed so far. 

To evaluate the approximation coefficients once an approximation basis is determined 
from the subset selection step, a standard approach is to minimize a norm between tar- 
get observations and reconstructed approximation as done in the previous section, Eqs. 
(30) and (32): the approximation coefficients of a given mode / 7 are basically given by 
c 7 = argmin~ gR |j7 T | \\z — ^ c|| 2 , with \I> £ M NqX ^ J ^ the matrix of the 7-group predictors 
evaluated in and z the target residual vector. The solution to this least squares 

problem is equivalently obtained from 

{c, Az} = argmin Az s.t. z + Az = f c, (34) 

which minimizes the Frobenius norm of the residual vector. This implicitly assumes 
no error in the coordinates & t which the target is evaluated. In practice, these 

coordinates are often not directly measurable and are inferred from observations via 
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Algorithm 2 Sketch of the solution method for approximating a random variable u (£) 

(PC) 

1: Select an a priori basis in HDMR format. Choose p, iVj, iVy , n r and p. Initialize 
z= (uW...u^ T 



Subset selection step. Solve the LASSO optimization problem with the gLARS 
algorithm — > sequence of a posteriori approximation bases indexed by s with ordered 
groups Jf, P ost = {7 (s) }- Initialize s and J/ jeff : s <- 0, J fiCS <- 0. 
Solve the approximation problem: 
repeat 
s <- s + 1. 

Consider the next mode /_,<» from the set Jf >post selected in (2): i7/, e ff ^— Jf,c& U7 '• 
Solve for the approximation coefficients {c 7 }^ e ^ ff by alternately solving for the 

coefficients of modes \f-y\ , Eqs. (30, 32). [Update step] 
Estimate the cross-validation error CVe and evaluate the current approximation 
u=(u^...u^ T 



9: Update the residual z <c— u — u. 

10: until CVe has increased over the last two passes s and s — 1. 

11: Jf,eS <~ J/,eff\{7 (s) ,7 (S_1) }- 

12: Update the coefficients {c 7 }^ g ^ ft of the retained modes with the extended set of 
data l^ q \u^\ 9 " . It finally yields u(£) expressed in the basis {/-y} 

auxiliary inverse problems. This brings errors so that the actual coordinates vector is 
only estimated with an error A^ q \ Since ^ depends on £, an error predictor matrix 
A^f (£, A£) := ^ (£ + A£) — ^ (£) arises and the estimation problem (34) then rewrites 

{c, A\I/, Az} = arg min I A* Az\\ s.t. z + Az = ($ + A*) c. (35) 



The realizations of the error in the data {A£ (9) (6),Az® (9)} are modeled to follow the 
distribution of zero-mean iid variables. Further, predictors may be correlated: 



\M, a - I [Aip a ]j ( Ail) a , - | [AiP c 



7^0, 



(36) 



with Aip a = Aip a ($} q \A$} q ^y A general approach to solve the weighted Total Least 
Squares (wTLS) problem of Eq. (35) consists in the minimization of the usual weighted 
residual sum of squares p 2 , Markovsky & van Huffel (2007): 



P 



vec (AA) T A" 1 vec (AA) , AA := (A* Az 



A := (Vz) 



(37) 



where the 'vec' operator unfolds a generic mxn matrix into a mn vector and X is the data 
matrix. The covariance matrix for X, A := /vec (X - (X) N ) vee (x - (X) N Y\ 



No 



evaluated and the minimization problem (35) is solved using the ALS-based algorithm 
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proposed in Wentzell et al. (1997). 



As will be shown in the numerical experiments examples, section 5.1.4, the present total 
least squares formulation allows to improve the approximation quality from noisy data. 
However, the solution process is significantly slower than that using the standard least 
squares. The present paper is based on the assumption that the critical part of the whole 
solution chain of determining a good approximation of the Qol is the data acquisition and 
that the cost of the post-processing part is not the main issue. However, if the computa- 
tional cost is a strong concern, the robust approach should only be employed when noise in 
the data cannot be neglected with regard to the targeted accuracy of the approximation. 

Remark 2 When a large amount N q of experimental information is available, the data 
matrix X E M.^^ +1 ^ xNq can be large. The resulting correlation matrix A then has poten- 
tially very large dimensions. However, since the noise is assumed independent from one 
sample to another, A has a block diagonal structure. Further, it is a symmetric definite 
positive matrix, allowing for additional reduction of the storage requirement. The struc- 
ture of A is then exploited in solving the weighted total least squares problem above through 
sparse storage and operations. 

4-7 Numerical complexity 

While the primary motivation for this work is to determine an accurate representation of a 
random quantity from a small set of its realizations, it is desirable that the solution method 
remains computationally tractable. As seen above, the algorithm for approximating a 
random variable is essentially two fold. 

The selection process essentially consists in sequentially building a subset, section 4.4. 
Each step of the sequence involves solving a least squares problem of growing size and 
finding the basis function, or group of functions, within the a priori set iTprior most corre- 
lated with the current residual. The matrix of the least squares problem is ^ G l^ 9 *'^ 1308 *', 
with Impost I the cardinality of the current set of selected basis functions. The least squares 
problem is solved via a QR decomposition of ^ in O (^N q |j7p OSt | 2 ) operations. The iterative 
selection process is carried-out with a growing active set j7p OS t until the problem becomes 
ill-posed, i.e., until |j7p st| is about N q . We use grouped LARS and denote | impost | the 
average cardinality of the retained group predictors, i.e., the average number of basis 
functions in the group added to the active set. The subset selection process retains rif 
groups of variables so that the total cost associated with the least squares step of the 
subset selection is 




As groups of predictors are moved to the active set, the size of the remaining a priori set 
decreases, | J7p r i r | ^ currcnt ^ ~ |Jp rior | — s Impost |- The cost associated with the evaluation of 
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the correlation for each predictor in the inactive set is then: 

/ n f 



correl 



O \J2 N 1 (l^Pnorl " S Impost |) OC N q . (39) 



= 1 



In practice, the cost associated with the evaluation of the correlation of the predictors in 
the inactive set with the current residual dominates so that the whole cost of the subset 
selection finally approximates as 

n f (n f + l)' 



^^subsel fi^LS ~f~ a^correl — ^ I Ag |c7prior| Hf A<j |>^7-y,post| „ I • 



The second step of the solution method deals with the evaluation of the approxima- 
tion coefficients, section 4.5. The cost associated with evaluating the coefficients of a 

(PC) 

l-th interaction order mode, 1 < I < N( , encompasses the matrix ^ assembly cost 
0(N q p\/(l\ (p — /)!)) and the least squares solution O (iVq (p!/(i! (p — /)!)) 2 ) . Since modes 



{/-}. 



already evaluated may be updated once an additional one from the selected 

7&Jf,eB 

set is considered, the total cost is the sum of an arithmetic sequence. Its exact formulation 
depends on the selected set and is difficult to derive in closed-form. As a simple example, 
updating all coefficients for each new mode / 7 considered, neglecting the cost associated 
with first-order interaction modes and assuming only second-order interaction modes are 
retained in the a posteriori set, an upper bound for the cost writes 

\Jf,eS\ 

^cocf< E [0(sN q p l )+0(sN q p 21 )], with/ = 2, (41) 

s=l 

where |i7/, e fr| is the number of groups finally retained for the approximation by the CV 
test, see Algorithm 2. A quantitative discussion of the numerical cost is given in section 
5.1.5 with an illustrative example. 



4-8 Statistics and sensitivity analysis 



Once an approximation of the random variable u is obtained, it is easy to estimate its 
first statistical moments. The methodology described in sections 4.3-4.5 can be applied 
'as is', without a connection with approximating a random process. It is then useful to 
discuss how to determine basic statistics of the approximated random variable. From the 
HDMR format properties, the estimated mean is simply given by the first term of the 
decomposition: (u) L2 f S M= ) — h- 

Thanks to the orthogonality property of the modes {/ 7 }, the variance Var (u) := ^ (u — (u) L2 
approximates as the sum of the variance of the individual interaction modes: 
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Var (u) ~ ^ Var (f. 



7ej>,eff\0 



7 ; ' 



= E E 4,a II^H^(H, MH )+ E E 11^*11^(3 

7eJ>,eff\0 «,|a|<p 76^>,eff\0 r,r'=l 167 «=1 

l7|<AT i (PC) W, (PC) <lTl<JVi 

where use was made of the orthogonality of the Hilbertian basis {ip a }- 

Other standard statistical quantities are the sensitivity indices {5* 7 } := Var (^f^ / Var (u) 
which essentially represent the relative part of the variance of the Qol due to the interac- 
tion of a given set of input random variables only, Sobol (1993); Homma Sz Saltern (1996). 
From Eq. (42), it immediately follows that E T c{i,...,d}\0 S y = 1 and the explicit expression 
of the sensitivity indices is straightforward to derive from the HDMR format. In practice, 
it is often more useful to assess the influence of a given input onto the variance of the Qol 
with the total sensitivity indices {5r,i} =1 : 



E 7 C{l,...,d}\0:ieT Var (A, 

S T ,i ■= 77 7^ 1 < 1 < d. (43) 

Var (u) 



Again, using Eq. (42), this quantity is straightforward to estimate once the approximation 
of u (£) is available. 



5 Numerical experiments 



The methodology developed in the previous sections is now demonstrated on a set of ex- 
amples. Different aspects of the global solution method are illustrated on a 1-D stochastic 
diffusion equation. A more computationally involved example is next considered with a 
Shallow Water problem with multiple sources of uncertainty. 



5.1 Approximating a random variable 



We consider the 1-D stochastic diffusion equation presented in section 4.3, briefly recalled 
here for sake of convenience: 

V x {y{x,£)V x u{x,€)) = F{x,£), u (x_, £) = u {x+, £) = u+. (44) 



The stochastic approximation basis relies on a HDMR format with a maximum interaction 
order Ni = 3 and 1-D Legendre polynomials {ipa} P a =i °f maximum degree p = 8. 
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In this section, the focus is on approximating a purely random quantity, i.e., disregarding 
its spatial dependence. We then rely on samples of the solution u (x, £) taken at a given 



5.1.1 Influence of the number of samples 

We first focus on the achieved accuracy in the approximation with a given budget N q + N q 
samples. The number of test points N q to estimate the approximation error e, Eq. (22), 
is chosen sufficiently large so that e is well estimated, N q = 10, 000. In Fig. 3, the per- 
formance of the present gLARS-ALS methodology is compared with both a plain HDMR 
approximation, i.e., with no subset selection hence considering the whole a priori approx- 
imation basis, and a Polynomial Chaos (PC) approximation with a sparse grid technique. 
The Smolyak scheme associated with a Gauss-Patterson quadrature rule is used as the 
sparse grid, with varying number of points in the 1-D quadrature rule and varying levels. 
The dimensionality of the stochastic space is d = 8. 

The sparse grid is seen to require a large number of samples to reach a given approxi- 
mation accuracy. 2 The HDMR-format approximation, with various interaction orders Ni, 
provides a better performance than PC/Smolyak but still requires more points to reach a 
given accuracy than the present methodology which performs significantly better in ap- 
proximating the Qol from a given dataset. The present gLARS-ALS approximation error 
is also seen to be smooth and monotonic when the amount of information varies. When 
N q is large enough, the subset selection step becomes useless as all |j7p r ior| terms of the 
proposal basis can be estimated from the large amount of information and the present 
gLARS-ALS performance should then be similar as that of the HDMR. This is indeed 
what is observed for N q <; 5000. Note that the benefit of a subset selection step in terms of 
accuracy improvement increases with the dimension d as the size |j7p r i r| of the potential 
dictionary then grows. 

5.1.2 Influence of the stochastic dimension 

The approximation accuracy of the present method is now studied when the dimension 
of the stochastic space varies. The same problem as above is considered but with various 
truncation orders of the source F and the diffusion coefficient v definitions, see Eqs. (20). 
The solution of the diffusion problem (44) is of dimension d = dp + d u and the dimensions 
dp and d v are varied together, dp = d v . The resulting approximation error is plotted in 
Fig. 4 for different d when the number of available samples varies. From d = 8 to d — 40, 
the required number of points for a given accuracy is seen to increase significantly, between 
a 2- and a 10-fold factor. However this is much milder than the increase in the potential 

2 Note that the plain Smolyak scheme is used here, which does not exploit anisotropy in the 
response surface. More sophisticated Smolyak-based approximations have been developed, see 
Nobile et al. (2007), and are expected to provide better results. 
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Fig. 3. Convergence of the approximation with the number of samples N q + N q . Different ap- 
proximation methods are compared: plain HDMR, PC/Smolyak scheme sparse grid spectral 
decomposition and the present gLARS-ALS. The convergence is plotted in terms of e. d = 8, 
p = 8,N l = 3, AT/ PC) = 3. 

approximation basis cardinality, i.e., if not subset selection was done, as |j7p r i r| shifts 
from 10,565 (d = 8) to 1.7 x 10 6 (d = 40), demonstrating the efficiency of the subset 
selection step which activates only a small fraction of the dictionary. When d further 
increases from 40 to 100, the performance remains essentially the same with hardly any 
loss of accuracy for a fixed N q : the solution method is able to capture the low-dimensional 
manifold onto which the solution essentially lies and an increase in the size of the solution 
space hardly affects the number of samples it requires. This capability is a crucial feature 
when available data is scarce and the solution space is very large. As an illustration, when 
d = 100, and with the parameters retained, the potential cardinality of the approximation 
basis is about 27 x 10 6 while the number of available samples is O (100 — 10, 000). It clearly 
illustrates the pivotal importance of the subset selection step. Note that if one substitutes 
a Polynomial Chaos approximation to the present HDMR format, about 352 x 10 9 terms 
need be evaluated with the present settings, a clearly daunting task. 

For sake of completeness, the approximation given by a CP-format, Eq. (13), is also 
considered. The univariate functions {/i, r } are approximated with the same polynomial 
approximation as in the present gLARS-ALS approach and a Tikhonov-based regularized 
ALS technique is used to determine each f ijT in turn given the others. Upon convergence, 
the next set of modes {/i, r +i, . . . , f±r+i\ is evaluated until a maximum rank n r set by 
cross-validation. At each rank r, the best approximation, as estimated by cross-validation, 
is retained from a set of initial conditions and regularization parameter values. As can be 
appreciated from Fig. 4, the number of samples required for a given approximation error 
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is significantly larger than with the present gLARS-HDMR method. 




Fig. 4. Convergence of the approximation with the number of samples N q + N q and for different 
dimensionality of the Qol. The present gLARS-ALS approach is compared with a CANDECOM- 
P-PARAFAC-type technique (labeled 'CP'). 

5.1.3 Subset selection 

To further illustrate the subset selection step, the set of second and third order interac- 
tion retained modes {fj} je j f t are plotted in Fig. 5 in the d = 40 case. Each bullet 
represents one of the d stochastic dimensions and each line connects two (2-nd order, left 
plot) or three (3-rd order, right plot) dimensions, denoting a retained mode. The first 
dp = 20 of the 40 dimensions are associated with the source term F in the stochastic 
equation and are represented as the solid bullets of the first two quadrants, d G [1,20]. 
The other d v = 20 dimensions are associated with the uncertain diffusion coefficient v 
and are plotted as open bullets in the 3-rd and 4-th quadrants, d G [21,40]. The dimen- 
sions introduced by these two quantities are sorted with the associated magnitude of the 
eigenvalues o~f and a v of their kernel, see Eqs. (21), which decreases along the counter- 
clockwise direction. Hence, the norm of the eigenvalues of the kernel associated with F 
decreases when one goes counter-clockwise from the first to the second quadrant. Like- 
wise, the norm of the eigenvalues associated with dimensions introduced by v decreases 
from the third to the fourth quadrant. Dominant dimensions of the stochastic space for 
the output u approximation are thus expected to lie at the beginning of the first and/or 
third quadrant. 

From the plot of second order modes (left), the subset selection process is seen to retain 
interaction modes mainly associated with dominant eigenvalues of both F and v: they 
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mainly link bullets from the first (dominant) dimensions associated with F to the first 
(dominant) dimensions associated with u, as one might expect. Further, modes associated 
with two dimensions both introduced by v are seen to be selected while two dimensions 
both associated with F are rarely connected: the subset selection procedure is able to cap- 
ture the nonlinearity associated with v in the Qol and retains corresponding interaction 
modes. Indeed, note from Eq. (44) that the source term F interacts linearly with the solu- 
tion u while the diffusion coefficient is nonlinearly coupled with u and hence, interaction 
modes between two dimensions introduced by F do not contribute to the approximation. 
The third order modes (right plot) also illustrate the nonlinearity associated with v: the 
retained modes either connect dimensions associated with v only or with one F-related 
and two //-related dimensions. Again, no two dimensions of F are connected, consistently 
with the linear dependence of u with F. These results illustrate the effectiveness of the 
procedure to unveil the dominant dependence structure and to discard unnecessary ap- 
proximation basis functions. 




Fig. 5. Graphical representation of the interaction modes retained by the subset selection pro- 
cedure. Left: second order modes are plotted as a line linking two dimensions (bullets). Right: 
third order modes are represented as 3-branch stars and connect three dimensions. 



5.1.4 Robustness 

The robustness of the approximation against measurement noise is now investigated. The 
dataset corrupted with noise, mimicking variability introduced in practice 

l. J q=l 

in measurements and coordinates estimation. Denoting the nominal value with a star as 
superscript, noise in the coordinates is modeled as 

£(9) = £(«)* + s V I < q < N g , (45) 

with s > 0. The noise is modeled as an additive ^-dimensional, zero-centered, unit vari- 
ance, Gaussian random vector £ biased so that £^ G [—1, l] d , W q. It is independent from 
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one sample q to another. Without loss of generality, measurements are here modeled as 
being corrupted with a multiplicative noise: = u^* (l + s u (^\ with s u = 0.2 and 
C -A/" (0,1). 

The evolution of the approximation accuracy when the noise intensity s in the coordinates 
varies is plotted in Fig. 6 in terms of error estimation e. We compare gLARS-ALS using 
standard least squares (LS) with its 'robust' counterpart relying on weighted total least 
squares (wTLS) as discussed in section 4.6. 

When the noise intensity increases, the error exponentially increases, quickly deteriorating 
the quality of the approximation with a noise standard deviation here as low as s = 
3 x 10 -5 . When the noise is strong (low SNR), both the LS and the wTLS methods 
achieve poor accuracy. However, if the dataset is only mildly corrupted with noise, the 
wTLS approach is seen to achieve a significantly better accuracy than the standard least 
squares. While it is useful only on a range of SNR and somehow computationally costly, 
this feature is deemed important for a successful solution method in an experimental 
context where noise is naturally present. 

0.001 



0.0001 




1e-05 



0.0001 



0.001 



Fig. 6. Robustness of the approximation w.r.t. noise in the data: approximation error e from the 
standard least squares (LS) and weighted total least squares (wTLS). d = 5, N q = 500. 



5.1.5 Scaling of the solution method 

In this section, the numerical complexity associated with the different steps of the solution 
method is illustrated in terms of computational time. Numerical experiments are carried- 
out with varying number of samples N q and solution space dimensions d. When one is 
varying, the other remains constant. The nominal parameters are d = 40 (dimension 
of the stochastic space), p = 6 (maximum total order of the Legendre polynomials), 
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N q = 1000 (number of samples 
HDMR approximation), p 
step). 



), Ni = 3 (maximum interaction order of the truncated 
5 (maximum total polynomial order in the subset selection 



Numerical results are gathered in Fig. 7. The asymptotic behavior of the number rif 
of required subset selection iterations as introduced in section 4.7 might be different 
according to which limit is considered. For the present stochastic diffusion problem, first 
and second interaction order modes tend to be selected first. Assuming the active set 
i7/,post is dominated by first and second interaction order modes, it can easily be shown 
that the number of retained groups then satisfies 



rif < 1 + rif 1 + min 



d (d 



2 q 



n fl p 



P (p + 1) 



Tif x < min 



P 



(46) 



In the present example, second order interaction groups dominate the retained set so 
that the number of retained groups tends to scale as rif oc Nq/p 2 . From Eq. (40) and for 
the present nominal parameters, it results in the following limit behavior for the subset 
selection step: 



lim ^subsel OC N q \J P riov\/p 
N„— >+oo 



lim ^/subsci oc N q 2 | Jp rior | /p 2 

a— >+oo 

2 I rr I /~2 



here 
here 



'OC 



N q 2 d Nl p Nl - 
Ng 2 d Nl p Nr 



(47) 



Similarly, the cost associated with the coefficients evaluation is considered. The number 
of interaction modes j7/ jC fr effectively varies between 1 and O (N q /p 2 ) along the solu- 
tion procedure, and, since the cost associated with solving the least squares problem 
dominates that of the matrix assembly, the cost of their evaluation finally simplifies in 
^cocf oc O (^N q 2 p) or ^cocf oc O (^N q 3 /pj depending on whether the coefficients are up- 
dated whenever an additional group is considered or not, see section 4.5 and second bullet 
of step (3) in Algorithm 2. In the present regime, the cost is found not to depend on d. 



These asymptotic behaviors are consistent with the numerical experiments as can be 
appreciated from Fig. 7. The coefficients are here updated whenever a new mode from the 
selected set is considered, hence ^cocf oc O (^N q 3 /pj. It is seen that the subset selection 
step scales less favorably than the coefficients evaluation step with the dimensionality of 
the random variable. This stresses the benefit of a carefully chosen a priori approximation 
basis to reduce as much as possible the cardinality |j7p r i or | of the potential dictionary the 
final set is determined from by the subset selection step. 
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Fig. 7. Numerical cost of the subset selection and coefficients evaluation steps as a function the 
stochastic dimension d and size of the dataset N q . Approximation coefficients are fully updated 



for each new mode. Nominal parameters are d 
5.2 Stochastic process approximation 



40, p = 6, N g = 1000, N l = 3,p = 5, N, 



(PC) 



3. 



We now consider the approximation of the space-dependent random solution u (x, £) un- 
der the form (17) using Algorithm 1. The approximation obtained from different number 
of samples ix^ q \ u^j is compared with the Karhunen-Loeve modes, computed from 
a full knowledge of the Qol, which henceforth constitute the reference solution. The sim- 
ulation relies on the following parameters: \^ x \ = 32, p = 10, d = 6, Ni = 3, N[ = 3. 
The potential approximation basis cardinality is about \J X \ ^priori — 10 5 . 

Fig. 8 shows the first and second spatial modes, W\ (x) and w 2 (x) for different sizes 
of the dataset, N q = 1000, 3000, 9000 and 26, 000. The mean mode wo (x) is virtually 
indistinguishable from the reference solution mean mode for any of the dataset sizes and 
is not plotted. On the left plot (w\ (x)), it is seen that the approximation is decent, even 
with as low as N g = 1000 samples. For N q = 3000, the approximation is good. This 
(1 + d) = 7-dimensional case corresponds to Ng 1 ^ 1 ^ ~ 3.1 samples per solution space 
dimension only and about N q f (|J7~ X .| l^priorl) — 3% of the potentially required information. 

For approximating the second spatial mode (Fig. 8, right plot), more points are needed 
to reach a good accuracy but N q = 26, 000 is seen to already deliver a good performance. 
Quantitative approximation error results are gathered in Table 2 for various separation 
ranks N and number of samples N q . 

The satisfactory performance of the present method can be understood from the upper 
part of the Karhunen-Loeve approximation (normalized) spectrum plotted in Fig. 9. The 
norm of the eigenvalues decays quickly so that the first two modes contribute more than 
90 % of the Qol L 2 -norm, showing that this problem efficiently lends itself to the present 
separation of variables-based methodology. 
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Fig. 8. First (w\ (x), left) and second (w2 (x), right) spatial approximation modes of the stochastic 
diffusion solution. The reference (Karhunen-Loeve) solution is plotted for comparison (thick line). 
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Table 2 

Evolution of the approximation error e with the decomposition rank ./V and the number of 
samples N q . 
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Fig. 9. Normalized upper spectrum of the Karhunen-Loeve approximation. 
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5.3 A Shallow Water flow example 



The methodology is now applied to the approximation of the stochastic solution of a 
Shallow Water flow simulation with multiple sources of uncertainty. It is a simple model 
for the simulation of wave propagation on the ocean surface. Waves are here produced by 
the sudden displacement of the sea bottom at a given magnitude, extension and location, 
all uncertain. Further, the propagation of the waves also depends on the topography of 
the ocean floor, which is also uncertain. 



5.3.1 Model 



The problem is governed by the following set of equations: 



-nT = fcV2-9-^- i -bv 1 + S Vl , (48) 

-^i = -fcvi-9-^-bv 2 + S V2 , (49) 

dh _ d{ Vl (H + h)) _ d{v 2 (H + h)) 
dt dxi dx 2 



+ S h , (50) 



where (v\ (x, £, t) v 2 (x, £, t)) is the velocity vector at the surface, x = (x\ x 2 ) G Q C M 2 , 
h(x,£,t) the elevation of the surface from its position at rest, H(x,£) the sea depth, 
fc models the Coriolis force, b is the viscous drag coefficient, g the gravity constant and 
S Vl (x,£,t), S V2 (x,£,t), Sh(x,£,t) are the source fields. Without loss of generality, the 
drag b and the Coriolis force fc are neglected. No slip boundary conditions apply for the 
velocity. The sources are modeled as acting on h only, S Vl = and S V2 = 0. Sh models 
the source term acting on h due to, say, an underwater seismic event. The fluid density 
and the free surface pressure are implicitly assumed constant. 



5.3.2 Sources of uncertainty 

Letting £ = ($'£"), the ocean floor topography H(x,£) is described with a iV^-term 
expansion: 

h (x, eo = H( X )+j2 v^e: w *>? (*), (51) 

i=l 

with £'=(£{■•■ £'nh) ^ e stochastic germ associated to the uncertainty in H. Random 

variables {£^} i= ^ are iid, uniformly distributed. The source Sh is also uncertain and is 
modeled as a time-dependent, spatially distributed, quantity: 

S, (x, r, t) = a t (t) a ( (O exp ( J*-**l®) T £-*«l®) ) , (52) 

V <rs h (^2 ) / 
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where a t (t) is a given time envelop, (£") the uncertain source magnitude, o~s h (£2) drives 
the uncertain source spatial extension and x$ h (£3) is the uncertain spatial location. 

The solution of the Shallow Water problem then lies in a (d = Nh + 3)-dimensional 
stochastic space. Full details on the problem and the numerical implementation of the 
simulation are given in Mathelin et al. (2011). 



5.3.3 Approximation from an available database 

As an illustration of the methodology, we aim at approximating the sea surface field 
at a fixed amount of time t* after a seismic event. The Qol is then a random field 
u (x, £) = h (x, £, £*). An accurate description of this field is of importance for emergency 
plans in case of a seaquake. Sea level measurements of the surface at various spatial loca- 
tions from past events constitute the experimental dataset ^x^ q \^ q \ h (x^ q \ ^ q \t*^j j * 
used to derive an approximation of u under a separated form: u(x,£) ~ (u) N (x) + 
Y%=xWn{x) A n (£). 

The solution method here relies on a N q = 37, 000-sample dataset complemented with 
N q = 5000 cross-validation samples and a N q — 5000 set for error estimation. We consider 
a Nh = 5 expansion for the topography, leading to a stochastic dimension of d — 5 + 
3 = 8. The effective number of samples per dimension is then about ]V (? 1 /( d ^+ d ) ~ 2.9. 
The approximation is determined based on a \J X \ = 484 spatial discretization DOFs 
(spectral elements) at the deterministic level and p = 6-th order Legendre polynomials 
{^o-}) Ni = 3, Ni ?c ^ = 3, for the stochastic modes. The cardinality of this a priori basis 
is then \J~ X \ |j7p r i r| — 770 x 10 3 ^> N q , again relying on an efficient subset selection step 
to make the approximation problem well-posed. 

The approximation error when the rank N varies is shown in Table 3. It is seen that 
estimating the mean spatial mode wq leads to a relative error of about 0.12 while adding 
the first (wi, Ai) and second (w 2 , A 2 ) pair drops it to about 0.05. Further adding pairs does 
not lower the approximation error with this dataset and more samples would be needed 
to accurately estimate them. Spatial modes w and w± of the separated approximation 
are plotted in Fig. 10 for illustration. 



N 
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0.117 


0.056 


0.046 


0.044 



Table 3 

Relative approximation error e evolution with the decomposition rank N. N q = 37, 000. 
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Fig. 10. Mean (wq (x) = {u) N (x), left) and first (wi (x), right) spatial modes. 
6 Conclusion 

In this paper, a methodology was put together for deriving a functional representation of 
a random process only known through a collection of its pointwise evaluations. The pro- 
posed method essentially relies on an efficient determination of an approximation basis 
consistent with the available information. This involves the choice of an a priori canonical 
HDMR format combined with tuning the basis via a data-driven subset selection step. 
This subset selection is carried-out in a bottom-to-top manner, as opposed to a top-to- 
bottom manner as done in the Compressed Sensing standard framework. It essentially 
sorts the approximation HDMR modes (groups of predictors) by their contribution in 
approximating the Quantity of Interest. The final approximation can rely on a different 
functional description of the modes, typically of higher order and/or nonlinear in the coef- 
ficients. This allows to take advantage of both a computationally efficient subset selection 
process, allowing for a relevant basis structure given the data, and a parsimonious final 
approximation basis relying on a fewer number of coefficients hence making the most out 
of the data. 

The method is progressive and data-driven. Its efficiency was demonstrated on two exam- 
ples which have shown its ability to achieve a good approximation accuracy from a small 
dataset, as long as the quantity at hand is essentially lying on a low-dimensional manifold. 
In particular, the dominant dimensions are naturally revealed so that all the available in- 
formation can be dedicated to approximate relevant dependences only. Through a total 
least squares approach, it was also shown that some robustness can be achieved, an im- 
portant feature when dealing with experimental data. Using a robust approximation was 
shown to bring up to a 2-fold improvement upon the approximation error using standard 
least squares, but at the price of a computational overhead. The global solution method 
scales reasonably well, exhibiting a linear dependence with the cardinality of the pro- 
posal basis dictionary and a quadratic or cubic dependence with the number of samples, 
depending on the coefficients update strategy. 
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The present work was focused on a general methodology, disregarding fine-tuning aspects. 
Among other things, a natural improvement would be to carry-out a predictor selection 
within each retained groups j/ygjy poBt }, further lowering the number of coefficients in- 
volved in the approximation and hence allowing, say, higher polynomial orders to be used 
with the same amount of data. Further, the tensor structure of the Hilbert stochastic space 
can be exploited and developments towards a data-driven multilinear algebra effective tool 
for high-dimensional uncertainty quantification are currently carried-out. 
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